DOCUMENT RESUME

ED 090 305                                        TM 003 593

AUTHOR        Lai, Morris K.
TITLE         A Noncentral Analysis of Variance Model Relating
              Statistical and Practical Significance.
REPORT NO     B-47<|-1
PUB DATE      Apr 74
NOTE          35p.; Paper presented at the Annual Meeting of the
              American Educational Research Association (Chicago,
              Illinois, April, 1974)

EDRS PRICE    MF-$0.75 HC-$1.85 PLUS POSTAGE
DESCRIPTORS   *Analysis of Variance; Data Analysis; *Hypothesis
              Testing; *Mathematical Models; Probability; Research
              Problems; *Statistical Analysis; *Tests of
              Significance

ABSTRACT 

When analysis of variance is used, statistically
significant differences may or may not be of practical significance
to educators. A large part of the problem is due to the fact that a
"zero difference" null hypothesis can always be rejected
statistically if the sample size is large enough. If, however, a
method based on the noncentral F distribution is used, trivial
differences cannot attain statistical significance. The (non-zero)
null hypothesis is now rejected at the alpha level when the observed
F exceeds the noncentral F cutoff point where the noncentrality
parameter delta (sub 0) is determined by the minimum practical
difference set by the researcher. (Author)



A NONCENTRAL ANALYSIS OF VARIANCE MODEL
RELATING STATISTICAL AND PRACTICAL SIGNIFICANCE

Introduction 

Statement of the Problem 

One of the most widely used methods of analyzing research data in
the behavioral sciences is the analysis of variance (ANOVA), particularly
the fixed effects model (Morrison & Henkel, 1969). Integrally tied in
with this model is the idea of hypothesis testing in the form of tests
of statistical significance. Of three types of statistical inference
(point estimation, interval estimation, and hypothesis testing), behavioral
scientists have devoted themselves almost exclusively to hypothesis
testing (Heermann & Braskamp, 1970).

Several writers have criticized the current use of ANOVA (Selvin,
1957; DuBois, 1965; Bakan, 1966; Lykken, 1968; Fleiss, 1969; Overall,
1969). Other writers have suggested that with appropriate corrective
steps, the basic ANOVA model is an exemplary method of analyzing data and
obtaining meaningful results (Horst, 1967; Kempthorne & Doerfler, 1969;
Winch & Campbell, 1969).

Some critics have argued that tests of significance, as done in
ANOVA, essentially should not be used (e.g., Morrison & Henkel, 1970);
however, the pervasive influence of tradition has been recognized
(Sterling, 1959; Rozeboom, 1960; Lykken, 1968; Heermann & Braskamp, 1970).
More recently, Walker and Schaffarzick (1974), while reluctantly using the
criterion of statistical significance to compare studies, expressed the
hope for an improved methodology.



It would seem valuable to modify the ANOVA model such that some
inherent weaknesses, including those discussed by the aforementioned
critics, are overcome. In particular, it would be desirable to relate
practical significance more closely to statistical significance.

The notion of practical significance is complex in and of itself.
There is no commonly accepted method of determining practicality. In
educational research, where outcomes are not easily described in
cost-benefit terms, it is often quite difficult to decide if a difference due
to treatment is of educational or practical significance. Nevertheless,
such assessments of practical significance are being made, and the
current ANOVA model does not adequately handle the issue of practical
significance.

An analysis of variance model, based on the noncentral F distribution,
is presented in this paper as an attempt to improve upon the currently
used ANOVA model, in particular its inadequate handling of practical
significance.






Review of Related Research

Criticisms of Significance Testing 

The literature critical of significance testing has appeared mainly
in the past 15-17 years (Morrison & Henkel, 1970). Periodically researchers
have been reminded that statistical significance does not necessarily
imply practical significance (Selvin, 1957; DuBois, 1965; Mendenhall,
1968; Glass & Hakstian, 1969). In essence, what this warning says for
the ANOVA case is that F tests with their associated p (for probability)
level of significance are not sufficient means for assessing results.
Nevertheless, reviewers sometimes use only significance levels when
comparing results from several studies (e.g., Eysenck, 1960; Bracht, 1970).

Other authors have treated significant F values as implying sizable
differences (Guilford, 1956; Mendenhall, 1968). Guilford (1956, p. 275)
described the ANOVA results of a study:

The F ratio for machines is significant beyond the .01 point, 
leaving us with considerable confidence that the machine 
differences, as such, have a real bearing upon the difficulty 
of the task. 

Strictly speaking, such a significant F could have resulted where the
differences were trivial (in the practical sense). The following theorem
proves that for any predetermined (small) number, a statistically
significant F (for J = 2) or t ratio can be obtained, but such that the
differences due to treatment are less than that predetermined number.






Theorem: For any $c > 0$, and $0 < |\bar{X}_1 - \bar{X}_2| < c$, there exists an
$N_c$ such that if sample size $N > N_c$, then t is statistically
significant for an ordinary t-test. (In layman's terminology: With
a large enough sample size, statistical significance is obtainable
no matter how trivial the difference in means is.)

Proof: Let $c > 0$ be given with $|\bar{X}_1 - \bar{X}_2| < c$. Without loss
of generality, assume $s_1 = s_2 = s$ (homogeneity of variance
satisfied), and $n_1 = n_2 = n$ (equal cell sizes), so that
$N = n_1 + n_2 = 2n$.

Then

$$|t| = \frac{|\bar{X}_1 - \bar{X}_2|}{s\,(2/n)^{1/2}} = \frac{n^{1/2}\,|\bar{X}_1 - \bar{X}_2|}{s\,(2)^{1/2}}$$

Require that

$$n^{1/2} > \frac{10\,s\,(2)^{1/2}}{|\bar{X}_1 - \bar{X}_2|} \quad\Longrightarrow\quad n > \frac{200\,s^2}{|\bar{X}_1 - \bar{X}_2|^2}$$

Therefore, if $N = 2n > 400\,s^2 / |\bar{X}_1 - \bar{X}_2|^2$, then

$$|t| > \frac{10\,s\,(2)^{1/2}}{|\bar{X}_1 - \bar{X}_2|} \cdot \frac{|\bar{X}_1 - \bar{X}_2|}{s\,(2)^{1/2}} = 10$$

and t is statistically significant.

Q. E. D.
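The sample-size bound in this proof can be checked numerically. The sketch below is only an illustration of the inequality $n > 200 s^2 / |\bar{X}_1 - \bar{X}_2|^2$; the function names and the example values (a half-point difference with a standard deviation of 15) are hypothetical and do not come from the paper.

```python
import math


def min_total_n(s: float, diff: float) -> int:
    """Smallest total N = 2n with n strictly greater than 200 s^2 / diff^2."""
    n = math.floor(200 * s ** 2 / diff ** 2) + 1
    return 2 * n


def t_equal_groups(diff: float, s: float, n: int) -> float:
    """|t| for two groups of size n sharing the standard deviation s."""
    return abs(diff) / (s * math.sqrt(2.0 / n))


# Hypothetical values: a trivial half-point difference, SD of 15.
s, diff = 15.0, 0.5
total_n = min_total_n(s, diff)
t_value = t_equal_groups(diff, s, total_n // 2)
print(total_n, t_value)  # t_value exceeds 10 once N passes the bound
```

Under these assumed values the required N is very large, which is the point of the theorem: the difference stays trivial while the test statistic grows without limit.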



Because of the reliance on statistical significance in current
research methodology, a misleading picture appears in the literature.
A classical example was described by Bakan (1966). Suppose
$H_0$ (the null hypothesis) is true in the population. Accordingly,
if tests of significance are carried out by 100 independent
researchers, those 95 (approximately) who do not get statistical
significance probably will not bother to publish their findings.
The five who do attain statistical significance will be more
inclined to publish their findings, and they will make Type I errors.

Part of the misinterpretation of p values is due to a
misunderstanding of what the probabilities relate to. Camilleri (1962)
defined three types of probability: (1) intrinsic probability
between population variables (e.g., in a population of scores, what
is the probability of a score being greater than one population
standard deviation above the population mean), (2) auxiliary
probability between a sample and a population (e.g., maximum likelihood
estimates), and (3) inductive probability relating to the probable
validity of a hypothesis; that is, scientific inference. He
asserted that significance tests have been used for assessing
inductive probability when really they are more appropriate for
auxiliary probability.

Morrison & Henkel (1969) argued persuasively that, statistically
speaking, most research does not qualify from the standpoint
of legitimate use of significance tests. They presented the
following paradigm:

                                Type of Population Sampled

Type of Sampling Technique      Specified      Unspecified

Probability                         A               B

Nonprobability                      C               D

The only legitimate use of significance tests is with studies in Category A.

Several writers have criticized the all-or-none method involved with
significance testing (Rozeboom, 1960; Bakan, 1966; Meehl, 1967). Science
progresses by adjustments of degree of belief rather than firm decisions.
Bakan feels that the tests have little if anything to contribute to
scientific inference. He does agree with Rozeboom that the tests are
appropriate for making null hypothesis decisions.

R. A. Fisher (1959, p. 44) issued a caution about the interpretation
of significance levels:

They (tests of significance) do not generally lead to any
probability statements about the real world, but to a
rational and well-defined measure of reluctance to the
acceptance of the hypotheses they test.

Accepting the Null Hypothesis 

Some writers have advocated the accepting of $H_0$ if a significant
statistic is not observed (Walker & Lev, 1953; Guilford, 1956; Guenther,
1964; Kirk, 1968; Glass & Stanley, 1970). Yet statisticians often have
warned against such practices unless the power (probability of rejecting
$H_0$ when the alternative hypothesis is true) is known (Berkson, 1942;
Peatman, 1953; Mendenhall, 1968). Cohen (1969), however, has shown
that in typical psychological research, power of greater than .90 would
require larger samples than are usually available.

To emphasize the inappropriateness of accepting $H_0$ without knowing
the power, the proof of a simple theorem is presented which states that
for a given level of significance, there exist normal distributions
such that the F or t statistic will not be significant, but the size
of the effects will be larger than any predetermined number.

Theorem: There exist distributions satisfying the ANOVA assumptions
such that the null hypothesis is not rejected, but the means
differ by more than any pre-given number. (In layman's terminology:
If, upon a non-significant test statistic, you accept $H_0$,
then you may be calling a huge difference a "zero difference.")

Proof (2-sample case): Let $c > 0$ be given. Require that
$|\bar{X}_1 - \bar{X}_2| > c$. Without loss of generality, assume $s_1 = s_2 = s$
and $n_1 = n_2 = n$.

Then

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s\,(2/n)^{1/2}} = \frac{n^{1/2}\,(\bar{X}_1 - \bar{X}_2)}{(2)^{1/2}\,s}$$

Let $s = |\bar{X}_1 - \bar{X}_2| \cdot n^{1/2}$.

Then

$$t = \frac{n^{1/2}\,(\bar{X}_1 - \bar{X}_2)}{(2)^{1/2}\,|\bar{X}_1 - \bar{X}_2| \cdot n^{1/2}} = \pm .707 \quad \text{(not significant)}$$

Q. E. D.
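A short numerical sketch of this construction follows. It assumes the proof's choice of standard deviation, $s = |\bar{X}_1 - \bar{X}_2| \cdot n^{1/2}$; the function name and the list of example differences are hypothetical.

```python
import math


def t_with_scaled_sd(diff: float, n: int) -> float:
    """t for two equal groups when s is chosen as |diff| * sqrt(n)."""
    s = abs(diff) * math.sqrt(n)  # the proof's choice of s
    return math.sqrt(n) * diff / (math.sqrt(2) * s)


# However large the difference in means, t stays at 1/sqrt(2).
for diff in (10.0, 1e3, 1e6):
    print(diff, t_with_scaled_sd(diff, n=20))
```

The statistic does not depend on the size of the difference at all, so a non-significant t carries no information about how far apart the means are.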

This proof indicates that a researcher who accepts $H_0$ may be calling an
essentially infinite difference a "zero difference." McNemar's (1962)
suggestion of using three regions (acceptance, suspended judgment, and
rejection), depending on the size of the p, does not overcome this objection.

Conventional Rejection Levels 

The subservience to using conventional levels (e.g., .01 or .05)
was criticized over 30 years ago along with the very phrasing of "test
of significance" (Snedecor, 1942). Despite more current warnings about
the ready acceptance of conventional significance levels (McNemar, 1962;
Winer, 1962; Slough, 1963; Skipper, Guenther, & Nass, 1967; Labovitz,
1968), the American Psychological Association Publication Manual (1967)
advocates the use of asterisks to indicate the various conventional levels.
DuBois (1965) has contended that conventional levels promote objectivity.

Rosenthal and Gaito reported a "cliff" effect where researchers
showed a greatest loss of confidence between p = .05 and p = .10 (Rosenthal
& Gaito, 1963). However, a subsequent replication did not find this
effect (Beauchamp & May, 1964). Both studies noted that students and
faculty in the field of psychological research expressed more confidence
(degree of belief in research findings) in the same p values based on
100 than on 10 cases in the sample, despite the fact that this meant
that the smaller sample usually exhibited a larger difference.

Estimating Sample Size 

Much attention has been devoted to the estimation of sample size
with regard to detecting differences as statistically significant.
Winer (1962) and Cohen (1969) produced tables that are difficult to use.
A simpler method was presented by Overall and Dalal (1968). Most of
the writers in this area have emphasized sample size or power with
regard to obtaining statistical significance rather than sharpening of
estimates. For example, Cohen (1969) defined "power" as the probability
that an investigation would lead to statistically significant results.
Such a definition implies that the purpose of increasing power is to
increase the probability of obtaining statistical significance. It
also means that absurdities logically follow; for example, a larger
sample is sometimes deemed less desirable than a smaller one without any
mention of cost effectiveness (Hays, 1963). If, however, power is
defined in terms of sharpness of estimates rather than ability to detect
differences as statistically significant, then such an anomaly does not
arise, for in this case, the larger the sample, the better the estimate.
Furthermore, if the statistical level of significance is related to the
level of practical significance, then the importance of sample size is
placed in proper perspective. Since a zero null hypothesis can always
be rejected with a large enough sample under the ordinary ANOVA model
(see the first theorem under Criticisms of Significance Testing), the
decision making depends more on N than on the estimation of the
parameters. The noncentral ANOVA that is proposed in this paper will
result in a closer relationship between estimation and decision making.

Confidence Intervals 

The most commonly accepted method of treating practical significance
statistically is the use of confidence intervals around linear
combinations of means. These confidence intervals are described in most
educational statistics books, but are offered more as options than as
recommended and expected procedures (Guilford, 1956; McNemar, 1962).
Nevertheless, several educational researchers are beginning to use
confidence interval procedures, in particular the post-hoc methods of Tukey
and Scheffé (Scheffé, 1959). These post-hoc procedures control the
Type I error (rejecting a null hypothesis when, in fact, it is true) for
all contrasts, and as such are a definite improvement over the practice
of computing several t-tests. However, since post-hoc contrasts are
computed after a statistically significant F test without regard for
practical significance, much effort may be spent on putting bounds around
trivial results. What is needed is an F-type test which is significant
in the statistical sense only if the results are also of practical
significance.

Another confidence interval approach has been advocated by Rosenkrantz
(1972). The use of "direct confidence intervals" can promote the control
of the probability of indecisive results and thus provide opportunity
to study the weaknesses of the model under consideration.

Measures of Association



Another accepted means of attempting to relate practical significance
with statistical significance is the use of measures of association
like $\omega^2$ (Hays, 1963), which represent the percent of variance explained
(Nunnally, 1960; Duggan & Dean, 1968). These measures, unlike the p
values associated with the F test, are relatively independent of sample
size (Kennedy, 1970).



Summary of Related Research

Tests of statistical significance, as commonly used, have been
frequently criticized. Suggested improvements have also met with criticism,
particularly because of the continued lack of relationship between
statistical and practical significance. Well known writers have
advocated inappropriate methodology. Confidence intervals and measures of
association were seen as ways of assessing practical significance.
A test that relates statistical and practical significance is also needed.


Noncentral Analysis of Variance 

Related Research and Theory 

ANOVA's peculiar characteristic of sometimes resulting in statistical
but not practical significance leads to the following situation: a
statistical rejection of the null hypothesis can coincide with (1) a
practical difference, or (2) a trivial difference.

Some researchers regard levels of significance as indications of
the degree of certainty in the results. This certainty, however, refers
to the probability that the true difference is not exactly zero. It
does not refer to the size of the difference. With a large enough sample,
a difference can be minuscule and yet the p value could easily imply
that it is very certain that the true difference is not exactly zero.
Consider also the following example:

$\bar{X}_1 = 7$, $n_1 = 2$, $\bar{X}_2 = 2$, $n_2 = 2$, $s_{pooled} = 1$.

Then

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{pooled}\,(1/n_1 + 1/n_2)^{1/2}} = 5$$

If the low p value (p < .05) obtained is used as a measure of certainty,
then this example shows a case where one would be "certain" about the
results based on a sample of only four subjects.
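The four-subject example can be reproduced in a few lines. This is a sketch only: the function name is hypothetical, and the critical value 4.303 is the standard two-tailed .05 table value of t for 2 degrees of freedom, supplied here as an assumption rather than taken from the paper.

```python
import math


def pooled_t(mean1: float, mean2: float, n1: int, n2: int, s_pooled: float) -> float:
    """Two-sample t statistic computed from summary values."""
    standard_error = s_pooled * math.sqrt(1.0 / n1 + 1.0 / n2)
    return (mean1 - mean2) / standard_error


# Two-tailed .05 critical t for n1 + n2 - 2 = 2 degrees of freedom.
T_CRIT_05_DF2 = 4.303

t_value = pooled_t(7, 2, 2, 2, 1.0)
print(t_value, t_value > T_CRIT_05_DF2)
```

With only two observations per group the statistic still clears the conventional cutoff, which is why a small p value cannot be read as a measure of certainty.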

The most commonly accepted solution to the statistical-practical
significance dilemma seems to be one of first ascertaining statistical
significance and second assessing the practical significance of any
statistically significant results. In essence, the statistical test
does not necessarily match up with the practical one.

Since practicality often is assessed post hoc (after the statistical
test), it is reasonable to ask for an a priori (before the test)
assessment so that a more appropriate null hypothesis can be used. The
unquestioning acceptance of always using a zero difference null hypothesis
has been criticized by several writers (Grant, 1962; Kerlinger, 1964;
Cohen, 1969).

For the two sample t-test, it has been suggested (Dixon & Massey,
1969; Pena, 1970) that if d represents a practical difference, then the
test statistic is

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - d}{s_{\bar{X}_1 - \bar{X}_2}}$$

In this case the use of the ordinary t statistic would amount to asking
the wrong question. Instead of asking whether there is a difference at
all, researchers usually should be asking whether or not there is an
educational or practical difference. Instead of asking whether a Datsun
gets better mileage than a Cadillac, we should be asking how many more
miles per gallon a Datsun gets and whether this difference is of
practical importance.

Using Dixon and Massey's model, if a researcher obtains a statistically
significant difference then it will also be of practical significance
(i.e., greater than the preassigned value of d). Basically this procedure
results in the test of the appropriate (non-zero) null hypothesis.
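The effect of subtracting a practical difference d can be sketched as follows. The function and the numbers (means of 52 and 50, 400 subjects per group, pooled standard deviation 10, practical difference 3) are hypothetical illustrations, not values from Dixon and Massey or from this paper.

```python
import math


def shifted_t(mean1: float, mean2: float, n1: int, n2: int,
              s_pooled: float, d: float = 0.0) -> float:
    """t statistic for the null hypothesis that mean1 - mean2 is at most d."""
    standard_error = s_pooled * math.sqrt(1.0 / n1 + 1.0 / n2)
    return ((mean1 - mean2) - d) / standard_error


# Hypothetical data: a 2-point difference judged against a practical minimum of 3.
ordinary = shifted_t(52.0, 50.0, 400, 400, 10.0)        # zero-difference null
practical = shifted_t(52.0, 50.0, 400, 400, 10.0, d=3)  # non-zero null
print(ordinary, practical)
```

In this assumed case the ordinary statistic exceeds the usual large-sample cutoff of 1.96, while the shifted statistic is negative, so the same data are statistically significant against zero but not against the practical minimum.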

As indicated earlier, analysis of variance needs a similar procedure
since trivial differences may be statistically significant, and Tukey's
or Scheffé's confidence interval procedures (Scheffé, 1959) would merely
be putting bounds around trivial differences. Fortunately, the noncentral
parameter of the noncentral F distribution provides an analog to the d
used in the t statistic just described. Again, if the minimum practical
difference is greater than zero, then use of the ordinary F test amounts
to asking the wrong question.

Once a researcher has determined what constitutes a practical
difference, then the next problem is to associate this difference with
the appropriate noncentrality parameter. If this $\delta$ is correctly
determined, then the new model guarantees that statistical significance will
be related to practical significance. The influence of sample size on
the F value is no longer a problem, since as N increases, $\delta$ also increases
in such a way that the critical F value is automatically adjusted upwards
to compensate for the increase in the F statistic due to the larger
sample size.

Estimating the Noncentrality Parameter Associated with a Practical 
Difference 

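Kirk (1968) defined the noncentrality parameter $\delta$ by

$$\delta^2 = \frac{\sum_{j=1}^{J} n_j\,(\mu_j - \mu)^2}{\sigma_\varepsilon^2}$$

where J = number of treatments
      $n_j$ = number of subjects in the jth treatment
      $\mu_j$ = mean of the jth treatment
      $\mu$ = grand mean
      $\sigma_\varepsilon^2$ = error variance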

The noncentrality parameter expresses the size of the effects in
terms of the differences between the various group means and the grand
mean. When one speaks of practical differences, he is usually referring
to the differences between treatments rather than the difference between
each treatment and the overall mean. Of course, if a researcher could
relate what he considers a practical difference with the squared
differences between the group means and the grand mean, then he could
directly substitute into the formula for $\delta$ and thus determine the
noncentrality parameter associated with a practical difference.

Tang's (1938) classic on power defined the noncentrality parameter as
$\phi = \delta/\sqrt{J}$.

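Since, however, differences among treatments is the more common
approach, these differences will be related to $\delta$ by the following
theorem:

$$\sum_{i<j} (\mu_i - \mu_j)^2 = J \sum_{j=1}^{J} (\mu_j - \mu_{..})^2$$

where J = number of treatments, $\mu_i$ = mean of ith treatment,
and $\mu_{..} = \left(\sum_{j=1}^{J} \mu_j\right) / J$.

Gini (undated) proved a similar theorem for the relationship

Proof:

$$\sum_{j=1}^{J} (\mu_j - \mu_{..})^2 = \sum_{j} \mu_j^2 - 2\mu_{..}\sum_{j}\mu_j + J\mu_{..}^2 = \sum_{j} \mu_j^2 - \frac{1}{J}\Big(\sum_{j}\mu_j\Big)^2$$

$$\sum_{i<j} (\mu_i - \mu_j)^2 = (J-1)\sum_{j}\mu_j^2 - 2\sum_{i<j}\mu_i\mu_j = J\sum_{j}\mu_j^2 - \Big(\sum_{j}\mu_j\Big)^2 = J\sum_{j=1}^{J}(\mu_j - \mu_{..})^2$$

Q. E. D.
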
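The identity that this theorem relates, namely that the sum of squared pairwise differences among J means equals J times the sum of squared deviations from the grand mean, can be spot-checked numerically. The sketch below uses hypothetical function names and arbitrary example means.

```python
from itertools import combinations


def sum_sq_pairwise(means):
    """Sum over pairs i < j of (mu_i - mu_j)^2."""
    return sum((a - b) ** 2 for a, b in combinations(means, 2))


def sum_sq_about_grand_mean(means):
    """Sum over j of (mu_j - mu..)^2, with mu.. the unweighted grand mean."""
    grand = sum(means) / len(means)
    return sum((m - grand) ** 2 for m in means)


means = [0.0, 0.0, 15.0]  # arbitrary example values
lhs = sum_sq_pairwise(means)
rhs = len(means) * sum_sq_about_grand_mean(means)
print(lhs, rhs)
```

Because the identity is algebraic, it holds for any number of treatments and any set of means, not only for the three-group case used later in the paper.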



The main purpose of this theorem is to enable the researcher
to relate practical differences, which can be expressed in terms
of $(\mu_i - \mu_j)^2$, to $\delta$, which is a function of $(\mu_j - \mu_{..})^2$. It would
be difficult if the researcher had to translate practical
differences in terms of $(\mu_j - \mu_{..})$.

Given J treatments, let M = the minimum practical average
difference between treatments, expressed in terms of absolute
values:

$$M = \frac{\sum_{i<j} |\mu_i - \mu_j|}{\binom{J}{2}} \qquad \text{(average of pairwise differences, absolute values)}$$

Consider the case where cell sample sizes are equal. Since
$\delta^2 = n \sum_{j} (\mu_j - \mu_{..})^2 / \sigma^2$ and the previous theorem showed that
$\sum_{j} (\mu_j - \mu_{..})^2 = \frac{1}{J} \sum_{i<j} (\mu_i - \mu_j)^2$,
then a reasonable trial substitution for $|\mu_i - \mu_j|$ is M.
The square of the average of the pairwise differences is to be
substituted for the average of the (differences)$^2$. So we have

$$\hat{\delta}^2 = \frac{n\,(J-1)\,M^2}{2\,\sigma^2}$$

For J = 3, we have $\hat{\delta}^2 = n M^2 / \sigma^2$.

Besides using the minimum practical average M, it is also
possible to substitute the individual minimum pairwise differences
if these can be stated by the researcher. In addition, orthogonal
a priori contrasts may be performed with a $\delta$ being determined by
the particular contrast. Post-hoc contrasts like Scheffé's can
still be used as currently practiced since they control the Type I
error for all contrasts, independent of the truth of the null hypothesis
(Scheffé, 1959).

Let $R = \delta^2/\hat{\delta}^2$ be a measure of how good an approximation results from
using $M^2$ in place of the average of the squared pairwise differences.
Using several sample sizes, means, and variances, a computer program was
used to compute several such ratios. R turns out to be a function of the
relative distances between the means; for example, the lowest ratio (and
therefore, the best estimate) was obtained when the means were equally
distant (for instance, with $\mu_1 = 7$ and equal gaps to $\mu_2$ and $\mu_3$).
The worst estimate occurred when two means were as far away from the third
mean as possible (e.g., $\mu_1 = 0$, $\mu_2 = \mu_3 = 15$). In between, the
ratio was exactly determined by the variable $e = |a - b|/(a + b)$ where
$a = |\mu_1 - \mu_2|$, $b = |\mu_2 - \mu_3|$, and $\mu_1 \le \mu_2 \le \mu_3$. Accordingly $\delta^2$ can be
readily determined by multiplying $\hat{\delta}^2$ by R.
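The behavior of R for three groups can be sketched directly. The code below assumes equal cell sizes, so that n and the error variance cancel, and takes R as the ratio of the sum of squared deviations from the grand mean to (J - 1) M^2 / 2, with M the average absolute pairwise difference. The function names are hypothetical, and the closed form R = 9/8 + 3e^2/8 is a derivation under these assumptions, not a formula printed in the paper.

```python
from itertools import combinations


def r_ratio(means):
    """R = delta^2 / delta_hat^2 for equal cell sizes (n and sigma^2 cancel)."""
    j = len(means)
    grand = sum(means) / j
    sum_sq_effects = sum((m - grand) ** 2 for m in means)
    pairs = list(combinations(means, 2))
    m_avg = sum(abs(a - b) for a, b in pairs) / len(pairs)
    return sum_sq_effects / ((j - 1) * m_avg ** 2 / 2.0)


def e_index(m1, m2, m3):
    """e = |a - b| / (a + b) for ordered means m1 <= m2 <= m3."""
    a, b = abs(m1 - m2), abs(m2 - m3)
    return abs(a - b) / (a + b)


for means in ([0.0, 5.0, 10.0], [0.0, 3.0, 10.0], [0.0, 15.0, 15.0]):
    e = e_index(*means)
    print(means, r_ratio(means), 9 / 8 + 3 * e ** 2 / 8)
```

Under these assumptions the two extremes agree with the special cases reported for Figure 1: equally spaced means give R = 1.125 (about 1.13) and two coincident means give R = 1.5.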






Figure 1: R in Terms of Position of Means

[Number-line diagrams of the three means, with a and b the distances
between adjacent means and e = |a-b|/(a+b).]

Special cases:

i) e = 0 => R = 1.13 (best estimate): a = b, so
   e = |a-b|/(a+b) = 0/(a+b) = 0

ii) e = 1 => R = 1.5 (worst estimate): b = 0, so
   e = |a-b|/(a+b) = a/a = 1

Figure 2: R = $\delta^2/\hat{\delta}^2$ as a Function of e

[Plot: vertical axis R, from 1.10 to 1.50; horizontal axis e, from .1 to .9.]




Performing the Noncentral F Test

Once the a priori practical difference is used to determine
$\delta_0$, an ordinary F test is performed to test the hypotheses:

$H_0$: $\delta^2 \le \delta_0^2$ (there is no practical difference)
$H_1$: $\delta^2 > \delta_0^2$ (there is a practical difference)

Instead of rejecting $H_0$ when the observed $F > F_{\nu_1,\nu_2}(1-\alpha)$,
now $H_0$ is rejected when the observed $F > F'_{\nu_1,\nu_2,\delta_0}(1-\alpha)$, the
noncentral F cutoff point.

Patnaik's Approximation of Noncentral F 

The noncentral F distribution (denoted by F') has been tabled
only partially (Johnson & Welch, 1939; Barton, David, & O'Neill,
1960; Severo & Zelen, 1960; Tiku, 1966). Unlike central F, F'
cannot, in general, be expressed in closed form (Wishart, 1932;
Price, 1964). A reasonably complete F' table would probably be
too unwieldy for practical use (there are 389 pages in Resnikoff
and Lieberman's Tables of the Non-central t-distribution, 1957).
Of the several approximation procedures developed, Patnaik's (1949)
seems the most usable since it utilizes the already available and
familiar central F tables.

Although Patnaik's method involves laborious computation
(Feldt & Mahmoud, 1958; Grubbs, Coon, & Pearson, 1966), the resulting
formulas for the fixed effects ANOVA case are relatively simple.
The accuracy of Patnaik's approximation has been verified in
several studies (Pearson, 1952; Tukey, 1957; Sankaran, 1963; Seber,
1963). A brief outline of the method appears in the Appendix to
Scheffé's The Analysis of Variance (1959).






Derivation of Patnaik's Approximation for ANOVA

It can be shown that $E(\chi'^2_{\nu,\delta}) = \nu + \delta^2$ and variance
$= 2\nu + 4\delta^2$, where $\chi'^2_{\nu,\delta}$ = noncentral
$\chi^2$ with noncentrality parameter $\delta = (\sum_j n_j \alpha_j^2 / \sigma^2)^{1/2}$ (Scheffé, 1959).

A possible approximation of $\chi'^2_{\nu,\delta}$ is $c\,\chi^2_{\nu'}$.

Equating means and variances of the two distributions, we get

$c\,\nu' = \nu + \delta^2$ (since $E(\chi^2_{\nu'}) = \nu'$) and

$2c^2\nu' = 2\nu + 4\delta^2$ (since var $(\chi^2_{\nu'}) = 2\nu'$). Solve for c and $\nu'$.



/(c-hO/Cc-i) = Zif -t 

Noncentral F can be considered as $(U_1/\nu_1)/(U_2/\nu_2)$ where $U_1$ is $\chi'^2_{\nu_1,\delta}$
and $U_2$ is $\chi^2_{\nu_2}$.

where M ^ TCJ- iW^-^ 



Scheffé's problem IV.4 has an error in it. The expression
Pr (i i (T^'^Xl-^jSj'^ read Pr it£(%'^)(ii- ^) 
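As a worked illustration of the moment-matching step, the sketch below solves the two equations in their standard Patnaik form, writing `lam` for the noncentrality quantity added to the degrees of freedom in the mean. The function name and the example value of `lam` are hypothetical, and because conventions for the noncentrality parameter differ between texts, the resulting numbers need not reproduce the figures printed elsewhere in this paper.

```python
def patnaik(nu1: float, lam: float):
    """Two-moment fit of a noncentral chi-square by c * chi-square(nu_prime).

    Assumes mean nu1 + lam and variance 2*nu1 + 4*lam for the noncentral
    chi-square; returns c, nu_prime, and the multiplier of the central F.
    """
    c = (nu1 + 2.0 * lam) / (nu1 + lam)
    nu_prime = (nu1 + lam) ** 2 / (nu1 + 2.0 * lam)
    f_multiplier = (nu1 + lam) / nu1  # F' is approximated by f_multiplier * F(nu_prime, nu2)
    return c, nu_prime, f_multiplier


# Hypothetical input: 2 numerator degrees of freedom, lam = 80.
c, nu_prime, multiplier = patnaik(nu1=2, lam=80.0)
print(c, nu_prime, multiplier)
```

The cutoff for a given alpha is then the multiplier times the tabled central F value with `nu_prime` and the error degrees of freedom, which is why only central F tables are needed.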






With the formulas for $\nu'$ and $F'_{\nu_1,\nu_2,\delta}$, a noncentral analysis
of variance can now be performed. In the following illustrative
example, notice that the observed F statistic would be significant
for an ordinary ANOVA.

Illustrative Example

Noncentral ANOVA:
J = 3, N = 60, n = 20, $\sigma^2$ = 25

1. Researcher states that the average difference between pairs
of treatments must be greater than 10 in order for there to
be a practical significance.

2. The sample means are 38.9, 51.6, 53.3; F = 47.8.

3. The noncentral F cutoff point is 93.5.

Since F < 93.5, do not reject "$H_0$: There is no practical
difference."
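The inputs to this example can be examined with a small sketch. It assumes the trial substitution of the stated minimum average difference M into a noncentrality estimate of the form n (J - 1) M^2 / (2 sigma^2) for equal cell sizes; the function names are hypothetical, and the code does not attempt to reproduce the cutoff of 93.5.

```python
from itertools import combinations


def average_pairwise_difference(means):
    """Mean of the absolute differences over all pairs of group means."""
    pairs = list(combinations(means, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def delta_hat_sq(j: int, n: int, m: float, sigma_sq: float) -> float:
    """Trial noncentrality estimate for J equal-sized groups, assumed form."""
    return n * (j - 1) * m ** 2 / (2.0 * sigma_sq)


observed_m = average_pairwise_difference([38.9, 51.6, 53.3])
trial = delta_hat_sq(j=3, n=20, m=10.0, sigma_sq=25.0)
print(observed_m, trial)
```

The observed average pairwise difference among the three sample means falls just below the stated minimum of 10, which agrees with the example's decision not to reject the hypothesis of no practical difference.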

Monte Carlo Test of Noncentral ANOVA 
Rationale:

The CAL DEVIATE computer program (Hutchinson, 1967) was used
to generate pseudorandom samples from three normal populations
with the following parameters: $\mu_1 = 40$, $\mu_2 = \ldots$, $\mu_3 = \ldots$;
$\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = 25$; $n_1 = n_2 = n_3 = 20$. The Monte Carlo test
for noncentral ANOVA is based on the following rationale:
(1) Suppose a researcher has been able to state a minimum
average practical difference among the groups. A method
described earlier relates this functionally to $\delta_0$, where the
researcher wants a statistically significant result to imply that
the average differences among groups are such that $\delta > \delta_0$. (2) If
the populations are set up such that the population $\delta = \delta_0$, then
a verification of the model would require that for a Type I error
rate of $\alpha$, 100$\alpha$% of the time the observed F would exceed the
tabled or computed noncentral F. Figure 3 presents pictorially
what is happening.

Figure 3: Rejection Region for Noncentral ANOVA

[Diagram: reject $H_0$: $\delta^2 \le \delta_0^2$ if the observed F is greater than
$F'_{\nu_1,\nu_2,\delta_0}(1-\alpha)$.]

Notice that if the ordinary ANOVA were used, a statistically
significant result would occur much more often than 100$\alpha$% of the
time even though the true differences are not sufficient to be of
practical significance.

Procedure:

One hundred analyses of variance were run using the BMD01V
program. The population is set up so that it barely misses
meeting the criterion of having practical difference. In ANOVA
hypothesis testing language:

$H_0$: $\delta^2 \le \delta_0^2$  there is no practical difference
$H_1$: $\delta^2 > \delta_0^2$  there is a practical difference

where $\delta^2_{pop} = \delta_0^2$. Since $\bar{X}$ (sample mean) is an unbiased estimate
of $\mu$ (population mean), the grand means of the three entire
samples generated are used in calculating $\delta^2_{pop}$. Each $\hat{\mu}_j$
represents the mean of all the data generated from the jth population.

$\hat{\mu}_1 = 39.95$, $\hat{\mu}_2 = 50.44$, $\hat{\mu}_3 = 54.99$

$\hat{\mu}_{..} = 48.46$ = grand mean of all the samples combined

$\alpha_1 = 6.53$, $\alpha_2 = 1.98$, $\alpha_3 = -8.51$  ($\alpha_j = \mu_j - \mu_{..}$)

$\alpha_1^2 = 42.64$, $\alpha_2^2 = 3.92$, $\alpha_3^2 = 72.25$; $\sum_j \alpha_j^2 = 118.81$




These formulas were derived earlier in this paper. The multiplier of
the central F is 48.42, and

$$\nu' = \frac{(48.42)^2}{2(48.42) - 1} = 24.5$$

$$F'_{\nu_1,\nu_2,\delta}(1-\alpha) = (48.42)\,F_{24.5,\,57}(1-\alpha)$$

= 102.65 for $\alpha$ = .01

= 82.31 for $\alpha$ = .05

= 73.11 for $\alpha$ = .10

Since 100 samples were run, approximately 1 F value should
exceed 102.65, about 5 should exceed 82.31, and about 10 should
exceed 73.11.

Table 1 compares the expected with the actual number of 
F values exceeding the various cut-off points. 



Table 1:

Summary of Monte Carlo Test of Noncentral ANOVA

alpha     Expected number          Actual number
          exceeding cutoff         exceeding cutoff

.01              1                        0

.05              5                        8

.10             10                       11



The misfit for $\alpha$ = .05 is not as bad as it seems, since of
the 8 exceeding 82.31, three were barely above that value (82.36,
82.56, and 82.89).
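A Monte Carlo run of this kind can be sketched with the standard library alone. Everything specific in the sketch is an assumption: the population means 40, 50, and 55 are hypothetical stand-ins, the generator is Python's `random.Random` rather than CAL DEVIATE, the F statistic is computed directly instead of by BMD01V, and 3.16 is an approximate central F cutoff at the .05 level for 2 and 57 degrees of freedom. The noncentral cutoffs are the three values printed above.

```python
import random


def one_way_f(groups):
    """F statistic for a one-way fixed effects analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def simulate(pop_means, sigma, n, runs, seed=0):
    """Draw `runs` sets of normal samples and return their F statistics."""
    rng = random.Random(seed)
    return [
        one_way_f([[rng.gauss(mu, sigma) for _ in range(n)] for mu in pop_means])
        for _ in range(runs)
    ]


NONCENTRAL_CUTOFFS = {0.01: 102.65, 0.05: 82.31, 0.10: 73.11}  # as printed in the text
CENTRAL_F_05 = 3.16  # approximate central F(.95; 2, 57), an assumed table value

f_values = simulate([40.0, 50.0, 55.0], sigma=5.0, n=20, runs=100)
for alpha, cutoff in NONCENTRAL_CUTOFFS.items():
    print(alpha, sum(f > cutoff for f in f_values))
print("ordinary ANOVA rejections:", sum(f > CENTRAL_F_05 for f in f_values))
```

Because the assumed population means are not the paper's, the counts above the noncentral cutoffs will not match Table 1; the sketch only shows the mechanics and the contrast with the ordinary central cutoff.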

Summary

The Monte Carlo test, in general, verified that the noncentral
ANOVA procedure is operating at near the appropriate
Type I error rate. Notice that an ordinary ANOVA procedure would
have yielded 100 statistically significant results where the
population has imposed upon it the characteristic of no practical
significance.



Summary and Discussion

Some of the common inappropriate uses of the traditional analysis
of variance and also the shortcomings inherent in the ANOVA model itself
have been described. The ANOVA model was modified to integrate practical
significance with statistical significance. The modified version, based
on the noncentral F distribution, included procedures for estimating the
required noncentrality parameter $\delta$, given that the researcher can state
a priori what constitutes a minimum practical difference among the group
means.

This proposed noncentral ANOVA would seem to be an improvement over
ANOVA in several aspects:

1. No longer can trivial (in the educational sense) results
attain statistical significance. Hence, the illogic of the concept
of "too large a sample" does not exist apart from cost effectiveness.

2. The researcher is forced to relate numerical scores with
practicality instead of analyzing scores in and of themselves.

3. Post-hoc contrasts (e.g., Scheffé, Tukey) are computed only
around non-trivial (in the educational sense) results.

4. A statistical rejection can no longer be followed by two
contradictory outcomes. In ordinary ANOVA, statistical significance
can go with (1) no practical significance or (2) a practical difference.
With noncentral ANOVA, statistical significance goes only with practical
significance because the appropriate hypothesis is being tested.

If noncentral ANOVA becomes widely used, it would be desirable
to have easily used noncentral F tables where a researcher need only
specify $\nu_1$, $\nu_2$, and $\delta$ to obtain the corresponding noncentral F value.
The various partial tables now in existence are geared mainly for
power calculations and are not readily usable (e.g., Tang, 1938; Cohen, 1969).

The overall importance of the procedures presented in this study
is the bringing together of statistics and practicality. This synergism
enables not only more meaningful presentation of results, but also the
powerful use of statistics in a complementary rather than ritualistic
way.

The use of noncentral ANOVA can improve the quality of data analysis 
while at the same time being straightforward enough for understanding and 
use by practitioners. E. S. Pearson (1938, p. 471) aptly described the 
importance of noncomplex concepts for users: 

If the object of the mathematical statistician is to provide 
tools for practical use, it seems important that the connexion 
between the abstract and the perceptual should be expressible 
in terms of the simplest possible probability concepts. 

Noncentral ANOVA would seem to meet this criterion while at the same 
time providing a means of eliminating some of the crucial shortcomings 
of the currently used ANOVA model. 






BIBLIOGRAPHY 



American Psychological Association. Publication Manual. Washington, 
D.C.: APA, 1967. 

Bakan, D. The test of significance in psychological research. 
Psychological Bulletin, 1966, 66, 423-437. 

Barton, D. E., David, F. N., & O'Neill, A. F. Some properties of the 
distribution of the logarithm of non-central F. Biometrika, 1960, 
47, 417-431. 

Beauchamp, K. L., & May, R. B. Replication report: Interpretation of 
levels of significance by psychological researchers. Psychological 
Reports, 1964, 14(1), 272. 

Berkson, J. Tests of significance considered as evidence. Journal of 
the American Statistical Association, 1942, 37, 325-335. 

Bracht, J. Experimental factors related to aptitude-treatment interaction. 
Review of Educational Research, 1970, 40(5), 627-645. 

Camilleri, S. F. Theory, probability, and induction in social research. 
American Sociological Review, 1962, 27, 170-178. 

Cohen, J. Statistical Power Analysis for the Behavioral Sciences. 
New York: Academic Press, 1969. 

Dixon, W., & Massey, F., Jr. Introduction to Statistical Analysis. 
New York: McGraw-Hill, 

DuBois, P. H. An Introduction to Psychological Statistics. New York: 
Harper & Row, 1965. 

Duggan, T. J., & Dean, C. W. Common misinterpretations of significance 
levels in sociology journals. The American Sociologist, 1968, 2, 
45-46. 

Eysenck, H. J. The concept of statistical significance and the controversy 
about one-tailed tests. Psychological Review, 1960, 67(4), 269-271. 

Feldt, L. S., & Mahmoud, M. W. Power function charts for specifications 
of sample size in analysis of variance. Psychometrika, 1958, 23(3), 
201-210. 

Fisher, R. A. Statistical Methods and Scientific Inference. New York: 
Hafner, 1959. 

Fleiss, J. Estimating the magnitude of experimental effects. 
Psychological Bulletin, 1969, 72, 273-276. 






Gini, C. W. Variabilita e mutabilita, contributo allo studio delle 
distribuzioni e relazioni statistiche. Studi Economico-Giuridici 
della R. Universita di Cagliari, undated. 

Glass, G., & Hakstian, A. Measures of association in comparative 
experiments: Their development and interpretation. American 
Educational Research Journal, 1969, 6, 403-414. 

Glass, G. V., & Stanley, J. C. Statistical Methods in Education and 
Psychology. Englewood Cliffs, N.J.: Prentice-Hall, 1970. 

Grant, D. A. Testing the null hypothesis and the strategy and tactics 
of investigating theoretical models. Psychological Review, 1962, 
69, 54-61. 

Grubbs, F. E., Coon, H. J., & Pearson, E. S. On the use of Patnaik 
type chi approximations to the range in significance tests. 
Biometrika, 1966, 53, 248-252. 

Guenther, W. C. Analysis of Variance. Englewood Cliffs, N.J.: 
Prentice-Hall, 1964. 

Guilford, J. P. Fundamental Statistics in Psychology and Education. 
New York: McGraw-Hill, 1956. 

Hays, W. Statistics. New York: Holt, Rinehart, & Winston, 1963. 

Heermann, E. F., & Braskamp, L. A. Readings in Statistics for the 
Behavioral Sciences. Englewood Cliffs, N.J.: Prentice-Hall, 1970. 

Horst, P. Psychological Measurement and Prediction. Belmont, California: 
Wadsworth, 1966. 

Hutchinson, D. CAL DEVIATE. Computer Center, University of California, 
Berkeley, 1967. 

Johnson, N. L., & Welch, B. L. Applications of the non-central 
t-distribution. Biometrika, 1939, 31, 362-389. 

Kempthorne, O., & Doerfler, T. E. The behavior of some significance 
tests under experimental randomization. Biometrika, 1969, 56, 
231-247. 

Kennedy, J. J. The eta coefficient in complex ANOVA designs. Educational 
and Psychological Measurement, 1970, 30, 885-889. 

Kerlinger, F. Foundations of Behavioral Research. New York: Holt, 
Rinehart, & Winston, 1964. 

Kirk, R. Experimental Design: Procedures for the Behavioral Sciences. 
Belmont, California: Brooks/Cole, 1968. 






Labovitz, S. Criteria for selecting a significance level: A note on the 
sacredness of .05. The American Sociologist, 1968, 3, 220-222. 

Lykken, D. Statistical significance in psychological research. 
Psychological Bulletin, 1968, 70, 151-159. 

McNemar, Q. Psychological statistics. New York: John Wiley & Sons, 
1962. 

Meehl, P. Theory testing in psychology and physics: A methodological 
paradox. Philosophy of Science, 1967, 34, 103-115. 

Mendenhall, W. Introduction to linear models and the design and analysis 
of experiments. Belmont, California: Wadsworth, 1968. 

Morrison, D. E., & Henkel, R. E. Significance tests reconsidered. The 
American Sociologist, 1969, 4, 131-140. 

Morrison, D. E., & Henkel, R. E. The significance test controversy. 
Chicago: Aldine, 1970. 

Nunnally, J. The place of statistics in psychology. Educational and 
Psychological Measurement, 1960, 20, 641-650. 

Overall, J. Classical statistical hypothesis testing within the context 
of Bayesian theory. Psychological Bulletin, 1969, 71, 285-292. 

Overall, J., & Dalal, S. N. Empirical formulae for estimating appropriate 
sample sizes for analysis of variance designs. Perceptual and Motor 
Skills, 1968, 27(2), 363-367. 

Patnaik, P. B. The noncentral χ² and F-distributions and their 
approximations. Biometrika, 1949, 36, 202-232. 

Pearson, E. S. Comparison of two approximations to the distribution of 
the range in small samples from normal populations. Biometrika, 
1952, 39, 130-136. 

Pearson, E. S. Note on Professor Pitman's contribution to the theory of 
estimation. Biometrika, 1938, 30, 471-474. 

Peatman, J. Introduction to applied statistics. New York: Harper & 
Row, 1963. 

Pena, D. A significant difference of opinion with the Coats position. 
Educational Researcher, 1970, 11, 9-10. 

Price, R. Some non-central F distributions expressed in closed form. 
Biometrika, 1964, 51, 107-122. 

Resnikoff, G. J., & Lieberman, G. J. Tables of the non-central 
t-distribution. Stanford: Stanford University Press, 1957. 




Rosenkrantz, R. The significance test controversy. Educational Researcher, 
1972, 1(12), 10-14. 

Rosenthal, R., & Gaito, J. The interpretation of levels of significance 
by psychological researchers. Journal of Psychology, 1963, 55(1), 
33-38. 

Rozeboom, W. The fallacy of the null hypothesis significance test. 
Psychological Bulletin, 1960, 57, 416-428. 

Sankaran, M. Approximations to the non-central chi-square distribution. 
Biometrika, 1963, 50, 199-204. 

Scheffé, H. The analysis of variance. New York: John Wiley & Sons, 1959. 

Seber, G. The non-central chi-squared and beta distributions. Biometrika, 
1963, 50, 542-544. 

Selvin, H. C. A critique of tests of significance in survey research. 
American Sociological Review, 1957, 22, 519-527. 

Severo, N. C., & Zelen, M. Normal approximation to the chi-square and 
non-central F probability function. Biometrika, 1960, 47, 411-416. 

Skipper, J. K., Guenther, A. C., & Nass, G. The sacredness of .05: A 
note concerning the uses of statistical levels of significance in 
social science. The American Sociologist, 1967, 1, 16-18. 

Slough, D. A. Experimental precision and tests of hypotheses. 
Psychological Record, 1963, 13(2), 221-226. 

Snedecor, G. W. The use of tests of significance in an agricultural 
experiment station. Journal of the American Statistical 
Association, 1942, 37, 383-386. 

Sterling, T. D. Publication decisions and their possible effects on 
inferences drawn from tests of significance--or vice versa. 
Journal of the American Statistical Association, 1959, 54, 30-34. 

Tang, P. C. The power function of the analysis of variance tests with 
tables and illustrations of their use. In J. Neyman & E. S. 
Pearson (Eds.), Statistical Research Memoirs, Vol. II. London: 
University of London, 1938. 

Tiku, M. L. A note on approximating to the non-central F distribution. 
Biometrika, 1966, 53, 606-610. 

Tukey, J. W. Approximations to the upper 5 percent points of Fisher's 
g distribution and non-central χ². Biometrika, 1957, 528-530. 






Walker, D. F., & Schaffarzick, J. Comparing curricula. Review of 
Educational Research, 1974, 44(1), 83-111. 

Walker, H. M., & Lev, J. Statistical Inference. New York: Henry 
Holt, 1953. 

Winch, R. F., & Campbell, D. T. Proof? No. Evidence? Yes. The 
significance of tests of significance. The American Sociologist, 
1969, 4, 140-143. 

Winer, B. J. Statistical principles in experimental design. New 
York: McGraw-Hill, 1962. 

Wishart, J. A. A note on the distribution of the correlation ratio. 
Biometrika, 1932, 24, 441-456. 



